Statistical analysis of the spatial distribution of operons in the transcriptional
regulation network of Escherichia coli

We have performed a statistical analysis of the spatial
distribution of operons in the transcriptional regulation 
network of Escherichia coli. The analysis reveals that 
operons that regulate each other and operons that are 
coregulated tend to lie next to each other on the genome. 
Moreover, these pairs of operons tend to be transcribed in 
diverging directions. This spatial arrangement of operons 
allows the upstream regulatory regions to interfere with 
each other. This affords additional regulatory control, as 
illustrated by a mean-field analysis of a feed-forward loop. 
Our results suggest that regulatory control can provide 
a selection pressure that drives operons together in the 
course of evolution. 

Most, if not all, organisms can respond and adapt to
a changing environment. To this end, they can detect,
transmit, and amplify environmental signals, as well as
integrate different signals to perform computations
analogous to those of electronic devices. Indeed, all
organisms can be considered to be information processing
machines. Yet how the living cell accurately processes
information is still poorly understood. Recent technological
developments, however, have made it possible to acquire
information on the regulatory architecture of the cell on a
massive scale, and extensive databases are now available
that catalog biochemical networks. This offers unprecedented
possibilities to unravel the design principles by
which organisms process information.

The current richness of genomic data surrounding
Escherichia coli makes it no doubt one of the best
characterized of all living organisms. The condensation of genes
into operons and the organization of operons into the
transcriptional regulation network are now well mapped,
and this information has been used to investigate generic
features such as the appearance of motifs in the
transcriptional regulation network. Here, we present a study of
the spatial organization of operons in the transcriptional
regulation network of E. coli. Our analysis of the spatial
distribution of operons provides two distinct advantages
over previous studies on the spatial distribution of
genes: firstly, it excludes correlations
from genes that belong to the same operon. Secondly,
and more importantly, by focusing on the higher-level
organisation of operons into the transcriptional regulation
network, the analysis allows us to elucidate spatial
correlations associated with regulatory control, for
instance, by identifying coregulated pairs of operons that
are adjacent on the DNA.

We find that there is a marked tendency for operons
that are related to each other in the transcriptional
regulation network to be nearest neighbours, compared to
networks in which operons are randomly assigned
positions on the DNA. Furthermore, the separations
between neighbour pairs have a strong bias towards short
distances, which is most pronounced for pairs that are
transcribed in diverging directions. In fact, our analysis
identifies a new, spatial network motif that consists
of pairs of overlapping operons, that is, operons whose
upstream regulatory domains overlap.

Several mechanisms could give rise to the observed
distributions. The strong bias towards short separations
could be a result of the mechanisms by which genes and
connections between genes arise and disappear during
evolution. Alternatively, it is conceivable that there
is a functional benefit in having certain operons close to
each other. This would lead to a selection pressure for
shorter separations between certain operons. We do not
investigate these scenarios in detail here, but our data
do support the latter scenario for the diverging
neighbour pairs. In particular, we examine a network motif
that has been identified by our statistical analysis: a
feed-forward loop in which the 'downstream' operons are
transcribed in diverging directions. The analysis shows
that overlapping regulatory domains for these operons
can strongly enhance the response of the network. Hence,
our results suggest that regulatory control can provide an
evolutionary driving force for the formation of
overlapping operons.

Methods 

Our starting point is the transcriptional regulation
network data compiled by Shen-Orr et al. for their motif
analysis. We have annotated their list of operons with
the start- and end-points of the coding regions, extracted
from the various databases of genomic information for
E. coli. We work with coding regions rather than
promoters because the former are easy to identify and
the distance between a promoter and the start of the
coding region is small compared to the typical distances
we consider here. The transcriptional regulation network
contains 404 operons with 558 links.

FIG. 1: Schematic overview of the sets of network motifs
studied in this paper. The full set is given by the
ensemble of all possible operon pairs. We have split the full set
into the following subsets: the ensemble of nearest neighbour
pairs (NN), corresponding to regions I-VI; the set of pairs
that are connected in the transcriptional regulation network
(TRN), corresponding to regions III-X; the set of pairs that
are coregulated by a distinct third operon (CR),
corresponding to II+III+VI+VII+X+XI; the set of autoregulated pairs
(AR), which is a proper subset of the TRN set (regions
V-VIII; here, protein X only regulates operon Y and Z in
regions VI and VII). The overlap of the TRN and CR sets (regions
III+VI+VII+X) corresponds to feed-forward loops (FFLs), a
network motif identified by Shen-Orr et al.

We focus on the statistics of the pair separations
between operons which are related in the transcriptional
regulation network by various definitions (see below), and
on pairs of neighboring operons on the genome. We
define the pair separation to be the distance along the DNA
between the coding regions (an alternative definition as
the distance between the starting points for
transcription was explored with similar results). One basic tool is
the cumulative distribution function
$F(s) = \int_0^s ds'\,P(s')$,
where P(s)ds is the probability that the distance between
two operons along the DNA has a value between s and
s + ds. The cumulative distribution function F(s) is used
in preference to the probability distribution function P(s)
because it is readily visualized even for sparse data sets,
and does not need to be corrected if a logarithmic axis is
used for s.
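
For a finite set of pair separations, F(s) can be estimated as the
fraction of pairs whose separation is at most s. The following is a
minimal Python sketch of that estimate; the function name
`empirical_cdf` and the sample separations are illustrative
assumptions and are not taken from the original analysis.

```python
from bisect import bisect_right


def empirical_cdf(separations):
    """Return a function F where F(s) is the fraction of separations <= s."""
    data = sorted(separations)
    if not data:
        raise ValueError("need at least one separation")
    n = len(data)

    def F(s):
        # bisect_right counts how many of the sorted values are <= s
        return bisect_right(data, s) / n

    return F


# Hypothetical separations in bp, for illustration only
F = empirical_cdf([80, 150, 480, 2100, 9600, 250000])
for s in (100, 1000, 10000):
    print(f"F({s} bp) = {F(s):.2f}")
```

Because the estimate is a step function of s, it can be plotted
directly against a logarithmic s axis without any rebinning, which is
the practical advantage noted in the text.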

We base our analysis on the operon pairs in three
overlapping sets: pairs of operons that are nearest neighbours
(NN) on the DNA, pairs of operons in the transcriptional
regulation network (TRN), and pairs of operons that are
coregulated (CR) by a third operon. In addition, we have
also considered autoregulated (AR) pairs, which are TRN
pairs for which the controlling operon also regulates itself.
The different sets and subsets are schematically indicated
in Fig. 1 and the sizes of the sets are given in Table I.

FIG. 2: The transcriptional regulation network of E. coli
shown as links between operons on the genome. Maps are
shown for (a) the real E. coli network and (b) a
representative 'randomised' network with the same topology, but with
a random permutation of the positions of the operons. The
color code is: blue: s < 10 kbp; green: 10 kbp < s < 500 kbp;
yellow: s > 500 kbp. Note the much greater prevalence of
the 'short' distances in the real map (a) compared to the
randomised map (b).

In order to determine the statistical significance of the
different quantities for the E. coli network, we have
calculated the corresponding expectation values for an
ensemble of random networks. Since we are primarily
interested in the network motifs that arise due to the spatial
organisation of the network and not in those that are a
consequence of the topology of the network (which have
already been identified by Shen-Orr et al.), we define
a random network to be a network with the same
connectivity as the E. coli network, but with a random
assignment of operon positions and orientations on the genome.
Hence, not only are the NN sets of the E. coli network and
a random network equal in size, but the sizes of the
TRN, CR and AR sets in the E. coli network also equal those
in a random network, as the sizes of these sets are
determined by the topology of the network. In contrast, the
sizes of the overlaps corresponding to regions II-VI (see
Fig. 1) differ between the E. coli network and a random
network, since they are determined by the spatial
distribution of the operons. For the E. coli network, we can
directly obtain the sizes of the respective overlaps from
the various databases. For the ensemble of random
networks, we have calculated the expectation value
for the number of pairs, $M_\alpha$, in the overlap of the
set of nearest neighbour pairs with set $\alpha$
(be it set TRN, AR or CR), using $M_\alpha = p N_\alpha$, where p is
the probability that a randomly chosen pair is, in fact, a
nearest neighbour pair and $N_\alpha$ is the number of pairs in
set $\alpha$. We have verified these calculations by generating
random networks and computing the quantities directly.
The P-values reported are probabilities of finding a
subset in the random network of at least the size as observed
in E. coli, computed using the same statistical model.
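
The random-network expectation described above can be sketched in a
few lines of Python. The sketch assumes a circular genome on which N
operons form exactly N nearest-neighbour pairs out of N(N-1)/2 possible
pairs, so that p = 2/(N-1); this closed form, the function names, and
the value N = 396 (the total number of NN pairs listed in Table I) are
assumptions made for illustration, not statements from the text. The
second function mimics the direct check by random permutation.

```python
import random


def nn_probability(n_operons):
    """Chance that a random operon pair is adjacent on a circular genome."""
    return 2.0 / (n_operons - 1)


def expected_overlap(n_pairs_in_set, n_operons):
    """Expected number of pairs of a set that are also nearest neighbours."""
    return nn_probability(n_operons) * n_pairs_in_set


def simulate_overlap(pairs, n_operons, n_trials=2000, seed=0):
    """Monte Carlo estimate: permute operon slots, count adjacent pairs."""
    rng = random.Random(seed)
    slots = list(range(n_operons))
    total = 0
    for _ in range(n_trials):
        rng.shuffle(slots)  # slots[i] is the position index of operon i
        for a, b in pairs:
            d = abs(slots[a] - slots[b])
            if d == 1 or d == n_operons - 1:  # adjacent, including wrap-around
                total += 1
    return total / n_trials


# 497 TRN pairs, assuming N = 396 operons (hypothetical choice, see above)
print(round(expected_overlap(497, 396), 2))
```

With these assumptions the expectation for the TRN set comes out at
about 2.5 nearest-neighbour pairs, which is the order of magnitude
quoted for the random ensemble.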

Results 

Transcriptional Regulation Network 

FIG. 3: The cumulative probability distribution F(s) for the
distances along the DNA between operons in the
transcriptional regulatory network (TRN) of E. coli (solid line). The
dashed line corresponds to the average of an ensemble of
networks that have been obtained by randomly permuting the
positions of the operons, thus preserving the network
topology; the grey area denotes the 98% confidence regime. The
dashed line is indistinguishable from the distribution
function that is expected for a random network, which is given
by F(s) = s/(L_DNA/2). Note the significant 10-15% fraction
with s < 1 kbp for the E. coli network. The dotted-dashed
line corresponds to the TRN pairs that are not nearest
neighbours. The mean lengths of genes and operons in the data
set are 1.0 kbp and 2.1 kbp, respectively.

In Fig. 2 we compare a 'map' of the real transcriptional
regulation network to a map of a randomised version of
the network; the random network has been obtained by
randomly permuting the positions of the operons on the
DNA, thus preserving the topology of the network. It is
seen that the real network exhibits a larger number of
'short' (blue) links and a smaller number of 'long'
(yellow) links, as compared to the random network. This
indicates that operons that regulate each other tend to
lie closer to each other than can be expected for a random
network. We can quantify this by calculating the
distribution functions for the distances between the network
pairs. Fig. 3 shows the cumulative distribution functions
for the E. coli network and for the ensemble of random
networks. For the ensemble of random networks, we
expect that the separation between network pairs is
uniformly distributed between zero and L_DNA/2 = 2.3 Mbp,
where L_DNA = 4.64 Mbp is the total length of the DNA.
Fig. 3 shows that this is indeed the case. For the real
network, however, we find marked deviations from a uniform
distribution. It is seen that up to ≈ 200 bp, the
cumulative distribution function is close to zero; this lower
cut-off reflects the typical sizes of promoter regions for
operons. However, after ≈ 200 bp the cumulative
distribution function of the E. coli network sharply increases
by some 10-15%, until it follows a nearly uniform
distribution from ≈ 1 kbp upwards.
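
The uniform expectation for the random ensemble follows from the
circular geometry of the genome: the separation of two randomly placed
points can be at most half the genome length. A minimal Python sketch
is given below. It treats operons as points rather than extended
coding regions, which is a simplification, and the function names are
hypothetical.

```python
GENOME_LENGTH_BP = 4_640_000  # total DNA length quoted in the text (4.64 Mbp)


def circular_separation(x, y, genome_length=GENOME_LENGTH_BP):
    """Shortest distance between two positions on a circular genome."""
    d = abs(x - y) % genome_length
    return min(d, genome_length - d)


def uniform_cdf(s, genome_length=GENOME_LENGTH_BP):
    """F(s) = s / (L_DNA / 2), clipped to the interval [0, 1]."""
    half = genome_length / 2
    return min(max(s / half, 0.0), 1.0)


# Fraction of pairs expected within 1 kbp if positions were random
print(f"{uniform_cdf(1000):.5f}")
```

Under this model well below 0.1% of pairs would lie within 1 kbp of
each other, which puts the 10-15% fraction observed for the E. coli
network into perspective.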

What is the origin of the pronounced increase in F(s)
at around 200 bp? Do operons that are linked in the
transcriptional regulation network tend to be nearest
neighbours? Table I shows that this is indeed the case. In E. coli,
55 out of the 497 transcriptional regulation network pairs
are nearest neighbours, as compared to the average of 2.5
in the ensemble of random networks. Moreover, these
(NN,TRN) pairs tend to lie very close to each other: out
of the 55 (NN,TRN) pairs, 44 are located within 500 bp
of each other, which is much smaller than the mean
spacing of 9.6 kbp between operons. Fig. 3 also shows
that for the ensemble of transcriptional regulation
network pairs that are not nearest neighbours, the
cumulative distribution function is much closer to the average
of the ensemble of random networks. This establishes
that the bias towards short distances in the
transcriptional regulation network is due to the tendency of
network pairs to be nearest neighbours.

Neighboring operons on the DNA 

The results on the separation statistics for the
transcriptional regulation network motivated us to examine the
nearest neighbour pairs in more detail. We included not only
pairs that constitute links in the transcriptional
regulation network (regions III-VI in Fig. 1), but also
pairs that are coregulated by a common transcription
factor (regions II, III and VI in Fig. 1).

Table I shows the sizes of the overlaps between,
on the one hand, the set of nearest neighbour
(NN) pairs, and, on the other hand, the sets of
(autoregulatory) network pairs and coregulated pairs, respectively.
It is seen that these overlaps are significantly larger than
the corresponding overlaps in the ensemble of random
networks. Moreover, region I, corresponding to the
ensemble of nearest neighbours that are neither coregulated nor
form a link in the transcriptional regulation network, is
smaller than the corresponding region in the ensemble
of random networks (data not shown). Clearly, both
operons that regulate each other and operons that are
coregulated by a common transcription factor tend to
be nearest neighbours on the DNA.
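
To give a feeling for how far such counts lie from the random
expectation, the sketch below computes an upper-tail probability
under the assumption that the count in a random network is Poisson
distributed around its expectation value. The text does not name the
statistical model behind the reported P-values, so the Poisson choice
and the function name `poisson_tail` are assumptions made only for
this illustration.

```python
import math


def poisson_tail(k_observed, mean, max_terms=10_000):
    """P(X >= k_observed) for X ~ Poisson(mean), summed term by term.

    The upper tail is summed directly (rather than as 1 - CDF) so that
    very small probabilities do not vanish in rounding error.
    """
    if mean <= 0:
        raise ValueError("mean must be positive")
    if k_observed <= 0:
        return 1.0
    total = 0.0
    for k in range(k_observed, k_observed + max_terms):
        log_term = -mean + k * math.log(mean) - math.lgamma(k + 1)
        term = math.exp(log_term)
        total += term
        # past the mode the terms fall off quickly; stop once negligible
        if k > mean and term < 1e-18 * total:
            break
    return total


# Example: 45 observed pairs against an expectation of 4.8 (from Table I)
p = poisson_tail(45, 4.8)
print(f"log10(P) = {math.log10(p):.1f}")
```

For the diverging neighbour pairs with s < 500 bp this sketch yields a
tail probability of the same order of magnitude as the entry in
Table I, but it should be read as a plausibility check rather than a
reproduction of the original calculation.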

A spatial arrangement of operons in which adjacent
operons are transcribed in diverging directions allows the
upstream regulatory regions to overlap. As we discuss in
more detail below, this affords additional regulatory
control. We therefore addressed two questions: 1) Do
operons that are each other's nearest neighbours tend to be
transcribed in diverging directions? 2) Do these nearest
neighbours tend to lie relatively close to each other?

                                      all pairs                   pairs (s < 500 bp)
Set                       Orientation E. coli  Random  P-value    E. coli  Random  P-value
Regulation Network (TRN)                  497                          45
Coregulated (CR)                         4362                          97
Autoregulated (AR)                        318                          23
TRN & CR (i.e. FFLs)                       42                           6
AR & CR                                    24                           5
Nearest Neighbours (NN)   Diverging       103                          45     4.8   10^-28
(I-VI)                    Converging      105                          13     4.8    10^-3
                          Tandem          188                          42     9.6   10^-14
TRN & NN                  Diverging        33    0.63   10^-44         27   0.031   10^-69
(III-VI)                  Converging        3    0.63    10^-2          2   0.031    10^-4
                          Tandem           19     1.3   10^-16         15   0.061   10^-31
CR & NN                   Diverging        22     5.5    10^-7         15    0.27   10^-21
(II+III+VI)               Converging       12     5.5    10^-2          2    0.27    10^-2
                          Tandem           26      11    10^-4          8    0.54    10^-7
AR & NN                   Diverging        21    0.40   10^-28         18   0.020   10^-47
(V+VI)                    Converging        1    0.40      0.3          1   0.020    10^-2
                          Tandem            5    0.81    10^-3          4   0.040    10^-7
TRN & CR & NN             Diverging         6   0.054   10^-11          4  0.0026   10^-12
(III+VI)                  Converging        1   0.054    10^-2          1  0.0026    10^-3
                          Tandem            1    0.11      0.1          1  0.0052    10^-2

TABLE I: Sizes of (sub)sets shown in Fig. 1 for E. coli; the roman numerals between brackets refer to the regions in Fig. 1. The
quantities in the 'Random' column refer to averages in the ensemble of random networks as defined in Methods. The P-values
are probabilities that a quantity of at least the size as observed in the E. coli network can be found in a random network. The
TRN & CR & NN set (III+VI) is identical to the AR & CR & NN set (VI) (not shown) apart from the additional presence of one
converging operon pair.

Table I shows the statistics for pairs of operons
broken down according to the relative direction of
transcription. There are three classes: 'tandem' (both operons
are transcribed in a common direction), 'converging' and
'diverging'. If the three classes of orientation were
populated in a random manner, one would expect the ratio
converging : diverging : tandem = 1 : 1 : 2. Our analysis
reveals that in region I and in regions VII-XI, the three
classes are indeed populated in a nearly random manner
(data not shown). In contrast, the orientation statistics
for adjacent operons that are either coregulated or form
an (autoregulatory) link in the transcriptional regulation
network are markedly different. These pairs tend to be
transcribed in diverging directions. Moreover, this effect
is most pronounced if the operons lie very close to each
other on the DNA. Table I shows that coregulated and
(autoregulatory) network pairs that lie less than 500 bp
apart from each other are predominantly transcribed in
diverging directions.
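
The three orientation classes and the 1 : 1 : 2 expectation can be
made concrete with a short Python sketch. It assumes the usual
convention that an operon on the '+' strand is transcribed towards
increasing genome coordinate and one on the '-' strand towards
decreasing coordinate; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def classify_orientation(left_strand, right_strand):
    """Classify a neighbour pair; 'left' is the operon at lower coordinate."""
    if left_strand == right_strand:
        return "tandem"
    if left_strand == "-" and right_strand == "+":
        return "diverging"  # both operons are transcribed away from the gap
    return "converging"     # both operons are transcribed towards the gap


def random_orientation_counts():
    """Enumerate the four equally likely strand combinations."""
    return Counter(
        classify_orientation(left, right) for left, right in product("+-", repeat=2)
    )


def expected_counts(n_pairs):
    """Expected class sizes if strands were assigned independently at random."""
    counts = random_orientation_counts()
    total = sum(counts.values())
    return {name: n_pairs * c / total for name, c in counts.items()}


print(expected_counts(396))
```

For the 396 nearest-neighbour pairs of Table I, random orientation
would give 99 converging, 99 diverging and 198 tandem pairs, close to
the observed 105, 103 and 188; the excess of diverging pairs only
appears once the set is restricted to regulated or coregulated
neighbours.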

To answer the second question, we have calculated the
separation-distribution function F(s) for nearest
neighbour pairs. If operons were distributed at random on
the genome, one would expect the separation statistics
to follow a Poisson distribution; this corresponds to a
model in statistical physics known as a Tonks gas [11].
Fig. 4, however, shows that the E. coli network exhibits
strong deviations from Poisson statistics. In particular,
it shows that a large number of links are distinctively
short. This is most striking for operons that are
transcribed in diverging directions, although it is also
noticeable for operons that are transcribed in a common
direction. About 45% of diverging pairs and 20% of
tandem pairs are closer than 500 bp, and these fractions are
much higher than would be expected for a random
network (see also Table I). It appears that the
transcriptional regulation network of E. coli has a large fraction
of adjoining operons. Importantly, Fig. 4(b) indicates that
most of these adjoining operons are, indeed, operons that
either regulate each other or are controlled by a common
transcription factor.
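
The random reference curve used for nearest-neighbour spacings,
F(s) = 1 - exp[-s/L_rand] with L_rand = 9.6 kbp (see the caption of
Fig. 4), is easy to evaluate. The following Python sketch does so;
the function name is hypothetical, and no attempt is made here to
reproduce the 'Random' column of Table I.

```python
import math

MEAN_SPACING_BP = 9600.0  # L_rand, the mean spacing between operons


def random_spacing_cdf(s, mean_spacing=MEAN_SPACING_BP):
    """Fraction of neighbour gaps shorter than s for randomly placed operons."""
    if s <= 0:
        return 0.0
    return 1.0 - math.exp(-s / mean_spacing)


# Fraction of neighbour pairs expected within 500 bp under random placement
print(f"{random_spacing_cdf(500):.3f}")
```

Random placement puts only about one neighbour pair in twenty within
500 bp, compared with the roughly 45% of diverging pairs and 20% of
tandem pairs found in E. coli.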



As mentioned above, an arrangement in which
neighboring operons are transcribed in diverging directions
allows the operator regions to interfere with each other.
The occurrence of diverging neighbour pairs with
operator interference can be assessed by a careful examination
of the EcoCyc database (for details see
supplementary information). Of the 45 diverging neighbour
pairs with s < 500 bp (see Table I), 20 operon pairs have
operator interference (there are, in fact, 3 examples of
operator interference with s > 500 bp); 10 do not; and
for the remaining 15, there is insufficient information on
the promoter/operator regions to decide. We conclude
that the presence of operator interference provides a
major part of the explanation for the strong bias towards
small separations for diverging pairs.

Discussion 

FIG. 4: The cumulative distribution functions F(s) for
nearest neighbour pairs (NN pairs) in the E. coli network: (a) split
by relative orientation; and (b) split according to whether
the neighbour pair is coregulated or in the transcriptional
regulation network (regions II-VI), or not (region I). The
dashed line is given by a Poisson distribution,
F(s) = 1 - exp[-s/L_rand], where L_rand = 9.6 kbp is the mean
spacing between operons.

The principal findings of our statistical analysis are: 1)
pairs of operons that regulate each other and pairs of
operons that are coregulated tend to be nearest
neighbours; 2) these nearest neighbours tend to be transcribed
in diverging directions; 3) the nearest neighbours'
separation statistics are strongly biased towards short distances.
What could be the origin of this behavior? Two distinct
scenarios can give rise to the observed separation
statistics. In the first, the bias is a result of the mechanisms
by which new operons and new links between operons
emerge and disappear in the course of evolution. In the
second, it is a consequence of a functional benefit of
having short separations between certain pairs of operons; in
this scenario, there is a selection pressure towards shorter
distances. It seems hard, if not impossible, to
disentangle the two mechanisms, although, in principle, there is a
clear difference between them. In the former scenario,
newly emerged operons will drift apart and, as a
function of time, the distance between them will increase. In
contrast, in the latter scenario, a selection pressure will
drive and keep the operons together.

Still, the question remains why it could be beneficial
to have pairs of operons close together. It is believed that in
prokaryotes transcription and translation take place
simultaneously (see Fig. 1.2 in Wagner [12]). It seems
advantageous to have transcription factors expressed close
to the locus at which they are supposed to act.
Furthermore, two operons can be topologically coupled via
the interplay between transcription and supercoiling [12].
This can lead to additional regulatory control, but only
if the distance between the operons is covered by the
twin-supercoiled domains.

FIG. 5: Response of FFLs as a function of the spatial
arrangement of the regulatory elements. The transcription
factors X and Y coherently regulate the expression of operon
Z; a dashed line indicates a weak cooperative interaction of
~ 3 k_B T, which corresponds to a cooperativity factor ω ~ 20;
RNAP denotes the enzyme RNA polymerase. On the left, the
different structures: (a) the expression of gene Z is activated
by the transcription factor X only; (b) a 'classical' FFL; (c) a
FFL in which the operator regions overlap. On the right, the
concentration in nM of the expressed protein Z as a function
of that of the transcription factor X; the inducer for
transcription factor Y is assumed to be present at saturating
concentrations. It is seen that a FFL can act as an amplifier
and that overlapping operons can significantly enhance the
performance of the amplifier.

Perhaps the most interesting case concerns the
prevalence of short distances amongst neighboring operons
that are transcribed in diverging directions. This spatial
arrangement of operons offers the possibility that the
operator regions interfere with each other. This can provide
an extra layer of regulatory control. Just as the existence
of operons provides for correlated gene expression,
interference between the regulatory regions for a pair of
diverging operons affords additional opportunities for
correlated or anticorrelated expression of operons.

One of the simplest regulatory constructs is a genetic
switch consisting of two operons that mutually repress
each other, and elsewhere we will publish a detailed
analysis of the consequences of correlated and
anticorrelated operon expression for the stability of such toggle
switches. Toggle switches are not a statistically
significant motif in the transcriptional regulation network
of E. coli though, and to demonstrate the effect we turn
to an example based around a feed-forward loop (FFL).
Shen-Orr et al. [1] demonstrated that FFLs are important
computational elements of the E. coli network and it is
believed that they can perform a variety of computational
tasks in a regulatory circuit: they can
filter transient signals, act as sign-sensitive
accelerators or sign-sensitive delays, or act as an amplifier,
in which the activity of the gene at the top of the loop is
amplified at the ultimate target gene [13, 15].
Our statistical analysis has revealed that in FFLs the
downstream operons tend to overlap (see Table I). In
order to investigate the role of overlapping operons in FFLs,
we have performed a mean-field analysis of FFLs in which
transcription factors X and Y coherently activate the
expression of operon Z (see Fig. 5 and supplementary
material for details). Fig. 5 shows that overlapping
operons can strongly enhance the sharpness of the response.
In contrast to the general scheme (Fig. 5b), the gene
regulatory proteins X and Y can simultaneously activate
gene Y and gene Z in the overlapping operon scenario
(Fig. 5c). This allows for extra cooperativity, which in
turn leads to a sharper response. It would seem that
the capacity to generate a strong response can confer a
competitive advantage to the organism in a number of
cases, such as in the repression of sugar-uptake systems
in response to glucose. Hence, our results suggest that
regulatory control can provide a selection pressure that
drives operons together in the course of evolution.
Supplementary material - overlapping operons

operon 1         start  (dirn)     end  operon 2         start  (dirn)     end   sepn  (sub)set interference
mhpABCDFE       367835  (+)     374105  mhpR            366811  (-)     367758     77  TRN      unknown
soxS           4274639  (-)    4274962  soxR           4275048  (+)    4275512     86  AR       yes
bioA            807191  (-)     808480  bioBFCD         808567  (+)     812170     87  CR       yes
cynTSX          358023  (+)     360370  cynR            357015  (-)     357914    109  AR       yes
lysA           2975659  (-)    2976921  lysR           2977043  (+)    2977978    122  AR       yes
betT            328687  (+)     330720  betIBA          324801  (-)     328558    129  AR       yes
torCAD         1057307  (+)    1061621  torR           1056485  (-)    1057177    130  AR       yes
hcaA1A2CBD-y   2667052  (+)    2671788  hcaR           2666026  (-)    2666916    136  AR       unknown
acrAB           480478  (-)     484843  acrR            484985  (+)     485632    142  TRN      unknown
gyrA           2334813  (-)    2337440  ubiG           2337587  (+)    2338309    147  none     no
cpxP           4103398  (+)    4103900  cpxAR          4101183  (-)    4103251    147  AR       unknown
ilvC           3955591  (+)    3957066  ilvY           3954548  (-)    3955441    150  AR       yes
pspABCDE       1366103  (+)    1368027  pspF           1364959  (-)    1365951    152  TRN      yes
asnA           3924783  (+)    3925775  asnC           3924173  (-)    3924631    152  AR       no
argCBH         4152580  (+)    4155802  argE           4151275  (-)    4152426    154  CR       yes
iclMR          4220383  (-)    4221246  metH           4221407  (+)    4225090    161  none     unknown
malXY          1697379  (+)    1700153  malI           1696176  (-)    1697204    175  AR & CR  yes
rtcAB          3554044  (-)    3555711  rtcR           3555900  (+)    3557498    189  TRN      unknown
yiaKLMNOPQRS   3740362  (+)    3744710  yiaJ           3739313  (-)    3740161    201  TRN      yes
hycABCDEFGH    2841059  (-)    2848458  hypA           2848670  (+)    2849020    212  CR       yes
fliE           2010722  (-)    2011036  fliFGHIJK      2011251  (+)    2017535    215  CR       unknown
dsdXA          2475867  (+)    2478550  dsdC           2474714  (-)    2475649    218  AR       no
zraP           4198841  (-)    4199266  hydHG          4199504  (+)    4202223    238  TRN      unknown
glcDEFGB       3119650  (-)    3126036  glcC           3126287  (+)    3127051    251  TRN      yes
ssb            4271704  (+)    4272240  uvrA           4268628  (-)    4271450    254  CR       yes
fliC           2000133  (-)    2001629  fliDST         2001895  (+)    2004101    266  CR       unknown
glpACB         2350667  (+)    2354731  glpTQ          2347955  (-)    2350394    273  CR       yes
melAB          4339489  (+)    4342368  melR           4338298  (-)    4339206    283  AR & CR  yes
rhaT           4097072  (-)    4098106  sodA           4098391  (+)    4099011    285  none     no
rhaBAD         4091029  (-)    4095029  rhaSR          4095317  (+)    4097075    288  AR       no
yhfA           3483051  (-)    3483455  crp            3483757  (+)    3484389    302  AR       yes
rpiB           4310929  (+)    4311378  rpiR-alsBACE   4309680  (-)    4310603    326  AR       unknown
nagE            703167  (+)     705113  nagBACD         698797  (-)     702834    333  AR & CR  no
araBAD           65855  (-)      70048  araC             70387  (+)      71265    339  AR & CR  yes
exuT           3242744  (+)    3244162  uxaCA          3239467  (-)    3242381    363  CR       unknown
malEFG         4240205  (-)    4243998  malK-lamB-ma   4244363  (+)    4248053    365  CR       yes
xylAB          3725546  (-)    3728394  xylFGHR        3728760  (+)    3733786    366  AR       no
entCEBA         624108  (+)     628520  fepB            622777  (-)     623733    375  CR       no
acs            4282992  (-)    4284950  nrfABCDEFG     4285343  (+)    4291718    393  none     unknown
purL           2689676  (-)    2693563  yfhD           2693959  (+)    2695377    396  none     unknown
nadB           2708440  (+)    2710062  rpoE-rseABC    2705342  (-)    2708032    408  none     no
argR           3382338  (+)    3382808  mdh            3380965  (-)    3381903    435  none     no
caiTABCDE        34781  (-)      41931  fixABCX          42367  (+)      45750    436  CR       yes
modABC          794312  (+)     796835  modE            793079  (-)     793867    445  TRN      unknown
leuLABCD         78848  (-)      83708  leuO             84191  (+)      85312    483  TRN      unknown
aroP            120178  (-)     121551  pdhR-aceEF-l    122092  (+)     129336    541  none     no
fucAO          2929887  (-)    2931710  fucPIKUR       2932257  (+)    2938121    547  AR & CR  no
malPQ          3545619  (-)    3550106  malT           3550718  (+)    3553423    612  TRN      no
gltA            752408  (-)     753691  sdhCDAB-b072    754400  (+)     764272    709  CR       yes
csgBA          1103174  (+)    1104125  csgDEFG        1100074  (-)    1102419    755  AR & CR  no
flgBCDEFGHIJ   1130241  (+)    1139244  flgMN          1128637  (-)    1129351    890  CR       unknown
glpD           3559646  (+)    3561151  glpR           3557480  (-)    3558238   1408  TRN      yes
narK           1277180  (+)    1278571  narL           1274402  (-)    1275052   2128  TRN      yes

Included are all diverging neighbour pairs with less than 1 kbp separation between coding regions, or with detected 
interference between operator regions if separation is greater than 1 kbp. Operon names have been truncated to 12 
characters to save space. All the AR & CR pairs are also examples of 'downstream' operons in feed-forward loops. 
For TRN and AR pairs, operon 1 is the controlled operon and operon 2 is the controlling operon. 
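
The 'sepn' column can be recomputed from the listed coordinates as the
gap between the end of one coding region and the start of the next. A
minimal Python sketch follows; the function name is hypothetical, and
the sketch ignores wrap-around at the origin of the circular genome,
which does not arise for the pairs listed here.

```python
def pair_separation(operon_a, operon_b):
    """Gap in bp between two coding regions given as (start, end), start < end."""
    first, second = sorted([operon_a, operon_b])
    # overlapping coding regions are assigned zero separation
    return max(second[0] - first[1], 0)


# mhpABCDFE (367835-374105) and mhpR (366811-367758), first row of the table
print(pair_separation((367835, 374105), (366811, 367758)))
```

Applying the same function to the last row, narK and narL, gives the
largest separation in the table.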
Supplementary material - feed-forward loops



We follow the approach of Shea and Ackers [16] and Buchler et al. [17] in modelling gene expression. The approach 
relies on the idea of "regulated recruitment" [18, 19]: gene regulatory proteins control gene expression by modulating 
the probability $P$ that the enzyme RNA polymerase is bound to the DNA; if the RNA polymerase is bound, then 
it is assumed that gene expression occurs at a fixed rate $\beta$. The macroscopic rate equation for the synthesis and 
degradation of a protein Z is thus given by: 

$$\frac{d[\mathrm{Z}]}{dt} = \beta P_Z - \mu [\mathrm{Z}]. \tag{1}$$

Here [Z] is the concentration of protein Z, $P_Z$ is the probability that the RNA polymerase is bound to the promoter 
for gene Z, and $\mu$ is the degradation rate of the protein. The probability $P_Z$ that an RNA polymerase is bound to the 
promoter of gene Z is given by 

$$P_Z = \frac{Z_{Z,\mathrm{on}}}{Z_{Z,\mathrm{on}} + Z_{Z,\mathrm{off}}}. \tag{2}$$



Here $Z_{Z,\mathrm{on}}$ and $Z_{Z,\mathrm{off}}$ are the partition functions for the system with the RNA polymerase bound and not bound to 
the promoter of gene Z, respectively. Following Buchler et al. [17], we characterize the interaction between a pair of 
proteins - a protein being either an RNA polymerase or a transcription factor - by a cooperativity factor $\omega$. A weak 
glue-like interaction of $3 k_B T \approx 2$ kcal/mol is assumed, which corresponds to a cooperativity factor $\omega \approx 20$. 
Furthermore, we assume that if gene expression is controlled by two gene regulatory proteins, the RNA polymerase 
can contact both proteins simultaneously. We have also considered the independent interaction model, in which the 
RNA polymerase can only interact with one gene regulatory protein at a time. We obtained similar conclusions 
for the two models. 
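
As a minimal sketch of how Eqs. (1) and (2) are used, the Python fragment below computes the cooperativity factor from the 3 k_B T contact energy, the promoter occupancy from a pair of partition functions, and the steady state of the rate equation. The example promoter (a single activator X that contacts the polymerase), the polymerase concentration `rnap`, the activator concentration `x`, and the rates `beta` and `mu` are illustrative assumptions, not values taken from the text.

```python
import math


def cooperativity(energy_in_kT):
    """Boltzmann factor of a favourable contact whose strength is given in units of k_B T."""
    return math.exp(energy_in_kT)


def occupancy(z_on, z_off):
    """Eq. (2): probability that RNA polymerase is bound to the promoter."""
    return z_on / (z_on + z_off)


def steady_state(beta, mu, p_bound):
    """Stationary solution of Eq. (1): 0 = beta * P - mu * [Z]."""
    return beta * p_bound / mu


omega = cooperativity(3.0)  # e^3, roughly 20

# Illustrative promoter with a single activator X (all numbers below are assumed).
k_x = k_rnap = 1000.0   # dissociation constants in nM
rnap = 100.0            # RNA polymerase concentration in nM (hypothetical)
beta, mu = 100.0, 0.1   # synthesis rate (nM/min) and degradation rate (1/min), hypothetical
x = 500.0               # activator concentration in nM (hypothetical)

z_off = 1 + x / k_x                            # polymerase absent: operator empty or holding X
z_on = rnap / k_rnap * (1 + omega * x / k_x)   # polymerase present, with or without X
p = occupancy(z_on, z_off)
print(f"omega = {omega:.1f}, P = {p:.3f}, steady-state [Z] = {steady_state(beta, mu, p):.0f} nM")
```

The saturating level of [Z] is `beta / mu`, reached when the occupancy approaches one.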

We now consider the structures shown in Fig. 6 of the supplementary material. 
Structure a: 



Zz, of f = i + pq/tfx 

^Z.on — 

[RNAP]/A RNAP (1 



■u[X]/K x ) 



Structure b: 



Structure c: 



Zy, 3 
Zy.on 
Zz,o« 
Zz,on 

Zy,o« 
Zy,oii 
Zz,oS 
Zz.on 



1 + [X]/A x 

[RNAP]/ -Krnap (1 + wpC]/tfx) 

1 + [X]/A x + [Y]/K Y + u[X][Y]/(K x K Y ) 

[RNAP]/A RNA p (1 + u[X]/Kx + to[Y]/K Y - 

1 + [X]/A x + [Y]/A' Y + u[X][Y}/(K x K Y ) 
[RNAP]/A R nap (1 + u;[X}/K x + lu[Y}/Ky - 
Zy,oS 

Zyon 



c 3 [X][Y]/(A x A Y )) 



■w 3 [X}[Y}/(K x Ky)) 



It is seen that the expression of Y depends on the concentration of Y. This means that in order to obtain the 
concentration of Y, we have to solve a quadratic equation in [Y]. This structure is included to show the effect of the 
autoregulatory loop on Y - the transcription factor Y interacts with the RNA polymerase bound to the promoter for 
gene Y. 
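
The quadratic arises because both partition functions of the Y promoter are linear in [Y]. A minimal sketch of the steady-state solution is given below, assuming one factor of the cooperativity factor per pairwise contact (so three contacts when X, Y and the polymerase are all bound); the function names and the values of `rnap`, `beta` and `mu` are hypothetical and chosen only for illustration.

```python
import math

OMEGA = math.exp(3.0)  # cooperativity factor for a 3 k_B T contact


def p_bound_y(x_conc, y_conc, rnap, k_x=1000.0, k_y=1000.0, k_rnap=1000.0, omega=OMEGA):
    """Occupancy of the Y promoter in structure c (concentrations in nM)."""
    x, y, r = x_conc / k_x, y_conc / k_y, rnap / k_rnap
    z_off = 1 + x + y + omega * x * y
    z_on = r * (1 + omega * x + omega * y + omega**3 * x * y)
    return z_on / (z_on + z_off)


def steady_state_y(x_conc, rnap, beta, mu, k_x=1000.0, k_y=1000.0, k_rnap=1000.0, omega=OMEGA):
    """Positive root of mu * [Y] = beta * P_Y([X], [Y]) for structure c."""
    x, r = x_conc / k_x, rnap / k_rnap
    # Write Z_on = a0 + a1 * y and Z_off = b0 + b1 * y with y = [Y] / K_Y.
    a0, a1 = r * (1 + omega * x), r * (omega + omega**3 * x)
    b0, b1 = 1 + x, 1 + omega * x
    # mu * K_Y * y * (Z_on + Z_off) = beta * Z_on  ->  qa * y^2 + qb * y + qc = 0
    qa = mu * k_y * (a1 + b1)
    qb = mu * k_y * (a0 + b0) - beta * a1
    qc = -beta * a0
    # qa > 0 and qc <= 0, so this is the only non-negative root.
    y = (-qb + math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)
    return y * k_y


# Hypothetical parameters: 100 nM polymerase, beta = 100 nM/min, mu = 0.1 per min.
for x_conc in (1.0, 10.0, 100.0, 1000.0):
    y_conc = steady_state_y(x_conc, rnap=100.0, beta=100.0, mu=0.1)
    print(f"[X] = {x_conc:7.1f} nM  ->  [Y] = {y_conc:7.1f} nM")
```

Substituting the returned [Y] back into `p_bound_y` reproduces the balance between synthesis and degradation, which is a quick check on the algebra.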

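Structure d: 

$$
\begin{aligned}
Z_{Y,\mathrm{off}} ={}& 1 + [\mathrm{X}]/K_X + [\mathrm{Y}]/K_Y + \omega [\mathrm{X}][\mathrm{Y}]/(K_X K_Y) + [\mathrm{RNAP}]/K_{\mathrm{RNAP}} + \omega [\mathrm{X}][\mathrm{RNAP}]/(K_X K_{\mathrm{RNAP}}) \\
& + \omega [\mathrm{Y}][\mathrm{RNAP}]/(K_Y K_{\mathrm{RNAP}}) + \omega^3 [\mathrm{X}][\mathrm{Y}][\mathrm{RNAP}]/(K_X K_Y K_{\mathrm{RNAP}}) \\
Z_{Y,\mathrm{on}} ={}& \frac{[\mathrm{RNAP}]}{K_{\mathrm{RNAP}}} \Big( 1 + \omega [\mathrm{X}]/K_X + \omega [\mathrm{Y}]/K_Y + [\mathrm{RNAP}]/K_{\mathrm{RNAP}} \\
& + \omega^3 [\mathrm{X}][\mathrm{Y}]/(K_X K_Y) + \omega^2 [\mathrm{X}][\mathrm{RNAP}]/(K_X K_{\mathrm{RNAP}}) + \omega^2 [\mathrm{Y}][\mathrm{RNAP}]/(K_Y K_{\mathrm{RNAP}}) \\
& + \omega^5 [\mathrm{X}][\mathrm{Y}][\mathrm{RNAP}]/(K_X K_Y K_{\mathrm{RNAP}}) \Big) \\
Z_{Z,\mathrm{on}} ={}& Z_{Y,\mathrm{on}} \\
Z_{Z,\mathrm{off}} ={}& Z_{Y,\mathrm{off}}
\end{aligned}
$$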
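
The weights of structure d can be cross-checked by counting contacts. The sketch below assumes that every pairwise contact between X, Y and a bound polymerase contributes one cooperativity factor and that the two polymerases do not interact with each other; in particular, the cubic weight of the state with X, Y and one polymerase bound is inferred from this counting. The function names and the sample arguments are hypothetical.

```python
import math

OMEGA = math.exp(3.0)  # cooperativity factor for a 3 k_B T contact


def structure_b_weights(x, y, r, omega=OMEGA):
    """(Z_on, Z_off) for the Z promoter of structure b; x, y, r are concentrations over their K."""
    z_off = 1 + x + y + omega * x * y
    z_on = r * (1 + omega * x + omega * y + omega**3 * x * y)
    return z_on, z_off


def structure_d_weights(x, y, r, omega=OMEGA):
    """(Z_on, Z_off) for one promoter of structure d, where the operator regions overlap."""
    # Promoter of interest empty; the neighbouring promoter is either empty or occupied.
    z_off = (1 + x + y + omega * x * y
             + r * (1 + omega * x + omega * y + omega**3 * x * y))
    # Promoter of interest occupied; the neighbouring promoter is either empty or occupied.
    z_on = r * ((1 + omega * x + omega * y + omega**3 * x * y)
                + r * (1 + omega**2 * x + omega**2 * y + omega**5 * x * y))
    return z_on, z_off


def occupancy(z_on, z_off):
    """Probability that the polymerase is bound to the promoter of interest."""
    return z_on / (z_on + z_off)


# Hypothetical reduced concentrations, for illustration only.
x, y, r = 0.1, 0.1, 0.1
print("structure b:", occupancy(*structure_b_weights(x, y, r)))
print("structure d:", occupancy(*structure_d_weights(x, y, r)))
```

Two consistency checks follow from this counting: the off-state weight of structure d equals the sum of the on- and off-state weights of a single promoter regulated by X and Y, and without transcription factors the occupancy reduces to that of an isolated promoter.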

FIG. 6: Response of FFLs as a function of the spatial arrangement of the regulatory elements. A dashed line indicates a weak 
cooperative interaction of $\approx 3 k_B T$, which corresponds to a cooperativity factor $\omega \approx 20$. On the left, the different structures: a, 
the expression of gene Z is activated by the transcription factor X only; b, a "classical" FFL; c, a FFL with an autoregulatory 
loop on Y; d, a FFL in which the operator regions overlap. On the right, the concentration of the expressed protein Z in nM 
as a function of that of the transcription factor X. Note that structure d corresponds to structure c of the feed-forward-loop 
figure in the main text. It is seen that a FFL can act as an amplifier and that overlapping operons can significantly enhance 
the performance of the amplifier. 



Note that this structure corresponds to structure c of the feed-forward-loop figure in the main text. 

We have taken $K_X = K_Y = K_{\mathrm{RNAP}} = 1000$ nM, where, for E. coli, 1 nM corresponds to roughly one molecule 
per cell. Note that structure c of the feed-forward-loop figure in the main text corresponds to structure d of Fig. 6 of the 
supplementary material. 
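
The conversion between nanomolar and molecules per cell can be made explicit. This is a rough sketch that assumes a cell volume of about 1 femtolitre, a commonly used order-of-magnitude figure for E. coli that is not stated in the text; the function name is illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_cell(conc_nM, volume_fL=1.0):
    """Number of molecules at the given concentration in a cell of the given volume.

    The 1 fL default is an assumed, order-of-magnitude E. coli cell volume.
    """
    return conc_nM * 1e-9 * AVOGADRO * volume_fL * 1e-15


print(molecules_per_cell(1.0))     # of order one molecule per cell
print(molecules_per_cell(1000.0))  # the dissociation constants used here, in molecules per cell
```

With this assumed volume, 1 nM is a little under one molecule per cell, so the dissociation constants of 1000 nM correspond to several hundred molecules per cell.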

We have also performed an extra set of calculations, in which there is no direct interaction between the transcription 
factor Y and the RNA polymerase bound to the promoter for gene Y. For all sets of calculations, we found that the 
structure in which the operator regions overlap (i.e. structure c of the feed-forward-loop figure in the main text and 
structure d of Fig. 6 of the supplementary material) gives the sharpest response. 



